Optimal experiment design revisited: fair, precise and minimal tomography. 
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Given an experimental set-up and a fixed number of measurements, how should one take data in 
order to optimally reconstruct the state of a quantum system? The problem of optimal experiment 
design (OED) for quantum state tomography was first broached by Kosut et al. [J. Here we 
provide efficient numerical algorithms for finding the optimal design, and analytic results for the 
case of 'minimal tomography'. We also introduce the average OED, which is independent of the state 
to be reconstructed, and the optimal design for tomography (ODT), which minimizes tomographic 
bias. We find that these two designs are generally similar. Monte-Carlo simulations confirm the 
utility of our results for qubits. Finally, we adapt our approach to deal with constrained techniques 
such as maximum likelihood estimation. We find that these are less amenable to optimization than 
cruder reconstruction methods, such as linear inversion. 

PACS numbers: 03.65.Wj, 03.65.Ta, 42.50.Dv 



I. INTRODUCTION 

Quantum state tomography [l], H H, H, H| is an in- 
dispensable tool in quantum information processing, be- 
ingessential for the characterization of quantum sources 
[10, 1] , gates (1 G3 , processes [U El El and mea- 
surements [Tfjl llq . The reconstruction of quantum states 
is, however, essentially a classical problem — that of es- 
timating the parameters of a density matrix from a data 
set. The classical theory of multiple parameter estima- 
tion is well developed 0, EE El (H (H HI HE H, 
HE H3 > an d so a considerable amount is known about the 
precision that can be achieved with quantum tomography 

In, m, m, no, nn . 

In this paper we are concerned with designing an ex- 
periment so as to optimize this precision. Choosing the 
right set of measurements is of paramount importance, 
and the optimal measurements for tomography are now 
known IE 11 IE IE 11 11 . But it is not always 
possible to implement them: more often than not, tech- 
nical constraints permit only a non-ideal set of measure- 
ments H- Given such a set, and a finite time in which 
to acquire data, one encounters the question "How much 
time should be spent on each measurement, so as to per- 
form the best tomographic inversion?" . This is the prob- 
lem of optimal experiment design — OED for short. We 
will sec that a judicious design can significantly improve 
the performance of tomographic reconstruction. Rather 
paradoxically, however, the OED generally depends on 
the state we wish to reconstruct, so that one cannot find 
the OED if one is completely ignorant of the quantum 
state. In this paper we introduce two alternative ap- 
proaches to experiment design that do not suffer from 



this state dependence. 

The paper is structured as follows. In Section |TT] we 
review the theory of multi-parameter estimation, and we 
introduce the notation to be used subsequently In Sec- 
tion [TTT] we will show how to find the OED quickly using 
standard numerical techniques. We then move on in Sec- 
tion |IV] to derive an analytic formula for the OED, which 
holds in the case that the quantum state is not overdeter- 
mined by the available data. Sections [V] and I VII address 
the problem of state tomography when no prior knowl- 
edge of the true quantum state exists. That is, when one 
really has no idea at all what quantum state we expect 
to find. On the one hand, we may decide that we should 
design our experiment so that, on average, whatever the 
true state, the tomography is as precise as possible. This 
we call the average OED. On the other hand, we might 
want our tomography to be as 'fair' as possible, so that 
the precision is as close as possible to being independent 
of the true state. We call this the optimal design for to- 
mography — ODT. We show that — fortunately — the 
tradeoff between precision and fairness is rather small. In 
Section I VIII we present the results of Monte-Carlo sim- 
ulations to corroborate our predictions. Finally in Sec- 
tion [VlTT] we consider adapting our results to the case of 
constrained estimators, such as maximum likelihood es- 
timation: we conclude that our optimizations still apply 
for such estimators, even though the improvement is less 
marked than for unconstrained tomography. Section IIXI 
wraps up the paper with some concluding remarks. 



II. BACKGROUND 



A. Bloch representation 
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We suppose that we are given an ensemble of iden- 
tical iV-dimensional quantum systems, all of which are 



2 



prepared in the same way. Quantum state tomography 
amounts to estimating 2N 2 real numbers comprising the 
elements of p, the complex-valued NxN density matrix 
describing the ensemble. But not all of these numbers 
are independent of one another, because p must always 
be Hermitian, with unit trace and positive eigenvalues 
Q • The first two conditions reduce the number of inde- 
pendent real parameters in the density matrix down to 
TV 2 — 1. An equivalent description of the quantum sys- 
tem is then given by the Block vector r, which is a real 
column vector whose N 2 — 1 elements uniquely determine 
p, according to the relation 



P 



1 

N 



I + r.cr. 



(1) 



where I is the NxN identity matrix. Here cr is an TV 2 — 1 
dimensional vector whose elements are NxN matrices 
that form an orthonormal basis for the space of traceless 
Hermitian operators [!, H(| • That is to say 



trfo} = 0, 



and 



tr-JcrjCTfc} = 5 



(2) 



Any set of matrices satisfying these conditions will suffice 
for constructing cr. For N = 2, the three elements of cr 
are conventionally taken to be the familiar Pauli matri- 
ces, and then r is the standard three dimensional vector 
that describes the state of a qubit in the Bloch sphere Q . 
The vector r is known as the Bloch or Fano representa- 
tion of p; its usefulness becomes clear when we consider 
the way it enters into the calculation of measurement 
statistics. 

We will perform our tomographic experiment by send- 
ing multiple copies of the state p into our apparatus. 
The apparatus can be configured in one of M different 
ways, so that there are M different possible measure- 
ments we can make (for instance, we could measure the 
three Cartesian components of a spin, and M = 3 in this 
case). Let 7 denote the particular measurement we are 
making. For each 7, there are n 7 different possible mea- 



surement outcomes (in our spin example, 



2S+ 1 



for all the measurements, where S is the total spin quan- 
tum number). Associated to each measurement out- 
come a is a so-called positive operator-valued measure 
element (POVM element) Q, which is a Hermitian, pos- 
itive NxN matrix n Q7 , such that the probability p ai 
of obtaining the outcome a in configuration 7 is given 
by 58] 



p ai = tr {n a7 p} . 



(3) 



The POVM elements contain the physics of our tomog- 
raphy set-up. All noise and detector inefficiency can be 
incorporated into them, so that once they are fixed, the 
only issues we must deal with are statistical. Like the 
density matrix, they are Hermitian, and can also be writ- 
ten in the Bloch representation: 



n, 



(4) 



where c Q7 is a real number, and real (N 2 - 1)- 

dimensional vector. For each measurement configuration 
7, the probabilities p Q7 must sum to unity, and this im- 
poses the sum constraint Xm=i ^a-y = I- ^ n the Bloch 
representation, we must have, accordingly, 



and 



2_j aa ~> 



(5) 



for all values of 7. Now, substituting Eq. (fj| into Eq. © 
we obtain the simple affine relation 



Pa~i 



(6) 



which connects the measurement statistics with the 
Bloch vector. The probabilities can be collected into an 
ntot-dimensional vector p, where 



"tot 



M 
7=1 



(7) 



is the total number of outcomes our experiment can pro- 
duce, across all measurement configurations. Then the 
statistics are summarized by the matrix equation 



P 



Ar, 



(8) 



where c is a real vector whose elements are the c aj , and 
where A is the n to t X (N 2 — 1) matrix whose rows are 
given by the vectorized POVM elements, 



\ 



"-li 



» 21 



Q7 



l n M M 



(9) 



This notation uses no more matrix elements than nec- 
essary, unlike using p and n a7 in (J3j . As is often the 
case, the notational compactness of matrices and vectors 
will prove invaluable in teasing out the structure of the 
optimizations that follow. 



B. Cramer-Rao bound 

Suppose that we use a total of N to t copies of our quan- 
tum system (note that A^ot is different to n to t , the latter 



3 



being the total number of possible outcomes) . We mea- 
sure iV 7 times in each of the M configurations, so that 
we have 



A I 



E^ 



(10) 



At the end of the experiment, we are left with an iVtot- 
dimcnsional data vector n whose elements n ai are the 
number of times the outcome a was observed in config- 
uration 7, so that n aj — -^7- We must process n 
somehow to produce an estimate of the true state r. The 
probability of obtaining the data vector n is given (up to 
an unimportant combinatorial factor) by the likelihood 
function p, 



(11) 



We will see below that the sensitivity of this likelihood 
function to r, through Eq. ([6]), determines the precision 
of our tomographic inversion. The OED is the set of 
numbers {N 7 } which makes our estimate of r as precise 
as possible. 

Our approach to finding the OED follows that pre- 
sented in in which we seek to minimize the Cramer- 
Rao bound [Tt], [H, HH . This is a classical information- 
theoretic lower bound on the mean squared error in the 
estimate of r, which does not depend on the algorithm 
used to extract that estimate, as long as the estimate 
is unbiased (We will discuss certain biased estimators in 
Section IVIIip . The Cramer- Rao bound is defined in terms 
of the Fisher information matrix (59| 



Only the last term contributes (the other terms vanish 
due to the second condition in Eq. ([5]) [H). Finally, we 
are left with the expression 



F 



iV tot Pofy 

Off ' 



(16) 



The inverse of this matrix is a lower bound on the co- 
variance matrix of our reconstructed state, known as the 
Cramer-Rao bound. That is, if we estimate the state to 
be f, then the matrix 7V to tCov (r, r) — F^ 1 has positive 
eigenvalues. In particular, taking the trace yields the 
condition 



where 



(\r-r\ 2 ) >B/N tot , 



B = trjF" 1 } 



(17) 



(18) 



The mean squared error in our reconstruction is there- 
fore bounded by the quantity B, which we will hence- 
forth refer to as 'the Cramer-Rao bound', or the 'CRB' 
for short. It has a clear operational meaning as the best 
mean squared error achievable by our tomography exper- 
iment (scaled by ATtot)- Furthermore, as the simulations 
in Section IVIII demonstrate, the CRB represents a tight 
bound — in the limit of a large number of measurements, 
the achieved precision (i.e. mean-squared error) for un- 
biased tomography is well-described by B/N tot . 



III. NUMERICAL OPTIMIZATION 



F 



N, 



<AA T ) 



(12) 



where (.) indicates an ensemble average and the symbol 
T denotes the transpose. The vector A = V r £ is the 
gradient of the log-likelihood function 



C = lap (n\r) = ^ n ai lnp Qn 



(13) 



a 7 



One might expect that 'informative' measurements are 
those associated with a greater sensitivity of p to r, and 
indeed with these definitions, a strong r-dcpcndcncc of p 
contributes to the magnitude of the Fisher information 
F. Substituting Eq. © into Eq. (T3|). we obtain A = 



a7 



/pa-y, so that 



F 



N tt 



EE 

«7 05 



trips) 



PayP/38 



(14) 



There are no correlations between different measurement 
configurations; within the same configuration, we can use 
the standard result for multinomial distributions, so we 
have 

(n ai nps) = S 7 s [N 7 (iV 7 - 1) p ai pp5 + N 1 5 a f 3 p ai \ . (15) 



We are now ready to give a more formal rendering of 
the problem at hand. Our aim in finding the OED is to 
discover the numbers {N^} which minimize B, subject 
to the constraint in Eq. (fT0|) . Of course the N 7 must be 
integers, since one cannot perform an experiment a frac- 
tional number of times! But the solution of this problem 
is intractable, being combinatorial in nature. We will 
instead consider what the authors of [l[ called the re- 
laxed problem, where we allow 'fractional experiments'. 
We define the real, positive M-dimcnsional vector A with 



elements A™ such that 



E A ^ 



(19) 



The A's are the 'weights' representing the experiment de- 
sign, so that in the limit ./Vtot — ► oo of a large number 
of measurements, we have 



(20) 



The experiment design A is only asymptotically correct, 
but as our simulations in Section fVIII show. the optimiza- 
tion of A produces results which are beneficial for finite 
data sets with a wide range of sizes: one simply rounds 



4 



iV 7 . The Fisher information becomes 



F 



E ^7 T 
a ai a al . 

Q 7 ^ 



(21) 



the right hand side of (f20j) to the nearest integer to obtain 7 th sub-block, and zeros everywhere else. To proceed 

with the optimization, we first pick an initial 'guess' for 
A. We then perform a quick line-search optimization to 
find the value of 77 that minimizes J, given our guess. We 
then feed A and rj into lsqnonlin. 

In Figure Q] below, we plot the OED predicted using 
the above procedure, for a model polarimctric experiment 
introduced by Kosut et al. [l[ (see Section 2.4, and in par- 
ticular the top panel of Figure 3 therein). We will not 
discuss the various parameters involved; we simply com- 
ment that the agreement between our results and those in 
[l[ is excellent. The advantage of the method presented 
here is that no dedicated convex optimization packages 
are required. The optimization is therefore easier to im- 
plement; only standard numerical tools (any 'conjugate 
gradients' algorithm will perform well) are required. Fur- 
thermore, the provision of the Hessian makes for very 
rapid convergence. In Section [Vj we will see that this 
numerical method can also be employed to find the av- 
erage OED. 



In similar expressions were derived, and a numeri- 
cal convex optimization routine was invoked to find the 
vector A which minimized B = t r , subject to the 

normalization constraint in Eq. (|19[) . But this problem 
does not require specialized convex optimization soft- 
ware. The optimization can be performed quickly using 
standard gradient-ascent algorithms, since the constraint 
in Eq. (|19|) can be incorporated using a Lagrange multi- 
plier. We define the cost function 



J = B 



(x>4 



(22) 



where 77 is a real Lagrange multiplier that imposes the 
normalization constraint on A. The OED is the vector 
A such that the cost function is rendered stationary, 



Va'/| a=a oed — 0. 



(23) 



To proceed further, we need to differentiate B. This is 
most easily done by re-writing the Fisher information as 
a product of matrices, as follows. 



F = A T AP- 1 A, 
where P = diag (p) and 



A = diag 



(24) 



(25) 
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FIG. 1: The OED for a simple polarization tomography ex- 
periment (described in [J|). This result is to be compared 
with the top-panel of Figure 3 in Q. The optimization we 
use is simple and fast, requiring no specialist software. 



are both n to t X "tot diagonal matrices, and where A was 
defined in Eq. ([9]). Here we used t m to denote the Tri- 
dimensional column vector whose elements are all equal 
to 1. With the expression in Eq. (12~H) in hand, it is easy 



(26) 



to differentiate the cost function [41J. We obtain 



Va J = ijl M - diag {tr a [P^AF^A 1 ] } , 



where (tr Q [-X - ]) s = ^ Q X a ~ ( a g indicates a partial trace 
over the measurement outcomes. The OED is found by 
minimizing the norm of V^J, which can be done effi- 
ciently using, for example, Matlab's lsqnonlin routine. 
Convergence is accelerated by providing the algorithm 
with an expression for the Hessian H of J — the matrix 
containing the gradients of Va-/: 



where 



Hs-f = d\ 5 d\ y J 

= tr [(c^A) P^AKsA 7 ] , 

K 8 = {F,F- 1 A T P- 1 (3 A5 A) AF' 1 } , 



(27) 



(28) 



with {., .} denoting the anti-commutator. The matrix 
d\ A contains the vector 1„ along the diagonal of its 



IV. MINIMAL OED 

In certain circumstances, wc can do better than the 
above numerical optimization: we can write down an an- 
alytic expression for the OED. This is possible if we are 
able to invert the Fisher information matrix F 'by hand'. 
The form of Eq. (|24[) is suggestive. If we can take the in- 
verse of the matrices A, P^ 1 and A, we can multiply 
them to form F . But in general, A is not invertible. 
First, because only square matrices have inverses, and it 
may be that n to t 7^ N 2 — 1, and second, because even if A 
is square, it must be of full rank, in order to be invertible. 
That is, none of its rows can be linearly dependent upon 
any of the others. But the rows of A are given by the oJ r 
which are always linearly dependent upon each other be- 
cause of the second condition in Eq. $5$ — ultimately 
because of the conservation of probability. This latter 
issue is potentially fatal, but fortunately it is possible 
to re-write the Fisher information in terms of full-rank 
matrices. To see how, note that from Eq. (J5J), we can 
re-express the last POVM element in each configuration 



5 



in terms of the other elements, 



"7 

ol'=1 



(29) 



where n 



7 



y — 1 is the number of independent measure- 
ment outcomes associated with configuration 7. In sub- 
sequent calculations we will always use primed indices, 
such as a! , to enumerate these independent outcomes 
(i.e. omitting the last outcome with a = n 7 ). Substitut- 
ing Eq. (J29]) into Eq. (plj). we obtain 



7 v Q/ q, < 3 ' / 

(30) 

Some re-definitions then afford a matrix decomposition 
for F: 



F = A T A (p- 1 +Q- 1 T^J A, 



(31) 



where A is essentially the same as A, except that the last 
POVM element a„ 7 has been removed for each configu- 
ration. The matrix A is therefore of size ntot x (N 2 — 1), 
where ntot = X) 7 % = n tot — M is the total number 
of independent measurement outcomes. Similarly, the 
ntot x ntot diagonal matrix P = diag (jj) is formed from 
the vector p of probabilities remaining after the last prob- 
ability Pn^-y for each configuration has been removed from 

p. The ntot x "tot diagonal matrix A is defined accord- 
ingly as 



A = diag 



Ail~ i5 . . . , A 7 1~^, 



(32) 



this reason, we refer to the situation n to t = N 2 — 1 as 
'minimal tomography'. There are good reasons why over- 
determined tomography is advantageous, since this intro- 
duces redundancy which suppresses the effects of statisti- 
cal noise [3!|. However, when attempting tomography of 
high-dimensional systems, the number of measurements 
required can become large, and in this case minimal to- 
mography, involving the fewest number of measurements 
possible, is appealing. Below we present an exact an- 
alytic result for the OED for minimal tomography on 
arbitrarily large systems. 

We start by explicitly inverting F in order to obtain 
the CRB. Define 



K = AA T 



(35) 



as the symmetric matrix whose elements are the scalar 
products between the POVM elements, K a ' 7i /S's = 
CLa'-y-Q-p'S- Then we can write 



B = triA^R- 1 



P 



p)}, 



(36) 



where in the second line we have inverted P _1 + 
block- wise using the rank-1 update to a matrix inverse 
[4l| . introducing the block-diagonal matrix P whose 

7 



th sub-block is equal to p 7 p 7 , where p 7 is the 



dimensional vector of probabilities for all but the last 
outcome of the 7 th measurement. 

After a little algebra, we can re- write this result in the 
following way, 



B 



E0 Q ' 7 



(37) 



Q is the ntot x ntot 'complementary' matrix to P. which 
contains the probabilities we removed from P, 



Q = diag 



Pnt-ltk V - 



„ (33) 

Finally, the matrix II appearing in Eq. (|3ip is the ntot x 
n to t block-diagonal matrix whose 7 th sub-block is equal 
to ls T l~ T - 

If the matrix A is square, it should now be invert- 
ible (unless we are unlucky and it is rank-deficient for 
some other reason, unrelated to the normalization of the 
POVM elements). To make A square, we must have that 



ntot 



N 



1. 



(34) 



That is, we require the total number of independent 
measurement outcomes to equal the number of indepen- 
dent real parameters specifying the quantum state. If 
"■tot < Af 2 — 1, tomographic inversion is not possible 
(this is intuitively obvious, but the Fisher information 
is also singular in this case). If n to t > V 2 — 1, the quan- 
tum state is over-determined by our measurements. For 



where the t> Q ' 7 are the elements of the ntot-dimensional 
vector 



b = p* (d — Dp) , 



(38) 



with D an n to t x ntot block-diagonal matrix whose blocks 
are given by the diagonal blocks of K~ : , and with the 
vector d given by 



d = diag(L>) = diag (K~ 



(39) 



Here the symbol * indicates the Hadamard product, i.e. 
element-wise multiplication. 

To find the OED, we differentiate the cost function J in 
Eq. (f2"2"| . Armed with the expression (f3"T)) for B, we can 
do this analytically, and we arrive at the result (omitting 
an unimportant normalization factor) 



A 



OED 

7 



'7- 



(40) 



For the special case of binary measurements (such as 
'click'/'no-click' photon counters), the formula becomes 
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especially simple, since binary POVMs have only a sin- 
gle independent outcome each. The matrix D becomes 
purely diagonal, and we can write 



^OED:binary _ 



^d y py(l-p y ), (41) 
where we have dropped the redundant index a' . 



V. AVERAGE OED 

So far we have considered the problem of OED when 
the true state is known — both the optimization in Sec- 
tion IIIII and the analytic expressions of the previous sec- 
tion require knowledge of r in order that the statistics p 
are fixed. What if we really have no information about 
the true state? This is surely when tomography is most 
useful, but we cannot then calculate the OED. In this 
section we will introduce the average OED, which does 
not require any knowledge of r. The idea is a simple one: 
where there is a dependence on r, we average over the 
space of all possible states, to account for our complete 
uncertainty about the true state. We will introduce two 
approaches. In the first, we average the Fisher informa- 
tion before minimizing the resulting CRB. In the second, 
we calculate the CRB analytically, and then perform the 
average. It seems that these two approaches produce es- 
sentially the same results, which is perhaps natural (but 
not obvious!). Using the first approach, one can calculate 
the average OED using a numerical optimization similar 
to the one described in Section ITTT1 The second approach 
yields an analytic result for the average OED, but of 
course it can only be applied to minimal tomography. 

Both approaches, however, are exact only for qubits. 
The reason is that the averaging requires an integral over 
the space of physical states. This space is the (TV 2 — 1)- 
dimensional volume such that all states r within it cor- 
respond to positive density matrices. Evaluating the 
boundaries of this region is non-trivial, and so in gen- 
eral the averaging cannot be performed exactly. But it 
is known that the space of physical states is a simply- 
connected convex region, whose boundary lies between 
two concentric (N 2 — 2)-dimensional hypersphercs whose 
radii are given respectively by [4(| 



Rn 



y/N(N-l) 



and i? n 



N — 1 



N 



(42) 



For the case of qubit tomography, with N = 2, these hy- 
perspheres coincide, with i? m i n = i? max = R2 = l/v2) 
the radius of the well-known Bloch sphere on which all 
pure qubit states lie. For higher dimensional systems, we 
will continue to approximate the space of physical states 
as being bounded by a hypersphere of radius Rn, where 
Rn must lie somewhere in between i? m i n and R max . If, at 
the end of the calculation, we find that the average OED 
does not depend (or depends only weakly) on our choice 
for R n , then we can be confident that we have closely ap- 
proximated the 'true' average OED. In practice, we have 



found this to be the case, so that our averaging procedure 
— crude though it is — produces useful results. 



A. Averaging the Fisher information 

The definition of the Fisher information in Eq. (JT2J) 
involves the evaluation of an ensemble average. For cal- 
culating the OED, we assumed that every member of 
our ensemble was prepared in the same state, and so we 
averaged over the statistical distribution of the data n, 
assuming a given state r. If we consider that, in fact, 
the prepared state is itself drawn from a large ensemble 
of possibilities, then we should extend our calculation of 
the expectation by averaging our result over those pos- 
sibilities. With no prior information at all, we should 
average over the space of all physical states. The Fisher 
information obtained by averaging in this way is 



(F) = A T AGA, 

where the elements of the diagonal matrix G 
are given by 



(43) 



(P- 



9a*y 



Pir) 

Pa~f(r) 



dV, 



(44) 



with p the probability distribution from which the states 
are drawn. In what follows we consider a uniform dis- 
tribution p = const, and take the integral to run over 
the (iV 2 — l)-dimensional volume enclosed by a hyper- 
sphere of radius Rn centred at the origin. For N = 2 the 
integral in Eq. (j44|) can be evaluated analytically to give 



+2c ai R2\a a ~ 



R 2 2 \a ai \ 2 ) In 



-a 7 



(45) 



The generalization to higher dimensions can be found 
recursively, 

9aj = 1 r x {In 2 -3 (c Q7 , \a>cey\) - In 2 -3 (caj,—\a a -y\)} , 

(46) 

where the integral I n (a, b) = J x n In (a + bx) dx satis- 
fies the recurrence relation 

R n 

(n + l)I n (a,b) = {(a + bR N ) [In (a + bR N )-l} + a} 
b 

~R n N +1 ^rr^n^i n ^(a,b), (47) 



n + 1 b 



with I (a,b) = \{a + bR N ) [In (o + bR N ) - 1] - 
% [ln(a) — 1]. Once the elements g ai of G are known, 
we can apply the numerical method described in Sec- 
tion IIIII to find the average OED, which is the experi- 
ment design A^ OED ^ that minimizes the associated CRB 
(B) = tr ((_F) _1 ). The only difference is that the matrix 



7 



P- 1 in Eqs. ([26]). (f27j) and (J28]) should be replaced with 
G. 

In Figure [2] below, we show the result of such a numer- 
ical optimization, for the same set of POVMs as were 
used to generate Figure Q] in Section UTT1 (the model is de- 
scribed in [l| ) . It is notable that the two figures look very 
different: the OED of Figure Q] is optimal for a particular 
state (a pure state in this case), while the average OED 
has been constructed so as to be optimal for any state, 
or more precisely, optimal when we do not know which 
state has been prepared. 



For the case of binary measurements, we drop the index 
a' , and we have / 7 = <7 7 . The formula in Eq. (|50p then 
vanishes, 



_1 

9n 



1 



(h, 



0, 



but the term in brackets cancels when the A 7 are nor- 
malized, so we arrive at the simple expression 



\ (OED:binary) 
A 7 



\Jd~tl9~i- 



(53) 
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FIG. 2: The average OED computed numerically from the 
averaged Fisher information. The set of POVMs is the same 
as those used to generate Figure [1] The marked difference 
between the two experiment designs shows how ignorance of 
the true state changes the optimization. 



If we are interested in minimal tomography (so that 
Eq. (|34p is satisfied), we can write down an analytic for- 
mula for the average OED. The averaged Fisher informa- 
tion is 



(F) = A T A(^P- 1 ■+ 
Inverting this gives the associated CRB, 

(ba'-y) 



(B) 



£ 

a'7 



A, 



where the averaging procedure yields 

(b) =g- 1 *(d-f*Dg~ 1 ) 



(48) 



(49) 



(50) 



with g 1 the element-wise inverse of the vector g, whose 
elements are the g a '-y, defined in Eqs. (|45|) and (|46[) above. 
The tilde symbol indicates, as usual, that the element 
associated with the last outcome of each POVM has been 
removed. The n to t-dimensional vector / is defined as 
follows, 



fa'-f 



1 



9 Pi 



for all a'. 



(51) 



Differentiation of the appropriate cost function then 
yields the formula for the average OED (again omitting 
a normalization factor), 



'7/ 



(52) 



B. Averaging the CRB directly 

We now describe a second method for computing the 
average OED. Our approach here is to average the an- 
alytic expression in Eq. <|3T|) for the CRB over a hyper- 
sphere — our approximation to the space of all physical 
states. Since the analytic expression only holds for mini- 
mal tomography, when Eq. (f3~4")l is satisfied, this method 
is applicable only in this situation. To distinguish this 
from the procedure outlined above, where we averaged 
F, we will use two angle-brackets to denote this type of 
average: 



(B) 



q ; 7 



A 



7 



Performing this averaging yields the result 

((b)) =c*(d-Dc)- ((x 2 ))dmg (DK) , 



(54) 



(55) 



where 



stands for the integral of a Cartesian coordi- 



nate x 2 over the hypcrsphcrc, divided by the hypersphcrc 
volume. For qubits, with N = 2, we have ((x 2 )) = 1/10, 
and in general we find [42], EH 



«z 2 )> 



R 2 N (N 2 -1)V(^) 



2{N 2 



i)r(^±! 



(56) 



where T(.) is the Eulcr Gamma function. Differentiat- 
ing the appropriate cost function leads to the following 
analytic formula for the average OED, 



(57) 



For the case of binary measurements, the formula reduces 
to 



A ((OED:bina ry) ) = ^ [ Cj (l - C,) - (( X 2 ))\a^ 



(58) 



As a simple application of the these approaches to cal- 
culating the average OED, consider the particular case 
of binary measurements chosen so that the POVM ele- 
ments arc mutually orthogonal in the Bloch representa- 



tion, with a^.as 



for some constant a. Then the 
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matrix K of scalar products is proportional to the iden- 
tity matrix, and inverting it gives d 1 = 1 /a 2 for all mea- 
surements. Suppose in addition that the measurements 
are symmetric in the sense that the 'identity components' 
associated with the two outcomes of each measurement 
are the same, c 7 = 1 - c 7 = 1/2. Then Eqs. ([58]) and 
([55]) give 



A «OED:bina ry » = ^ J_ _ ^ 

= constant, 

and A< OED:binar ^ = 1 

\Ja [l N 2_ 3 (±,a) - I N 2_ 3 (±,-a)] 

= constant. (59) 

That is, both approaches to computing the average OED 
predict a uniform distribution. This is to be expected, 
because the measurements just described are the most 
symmetric binary measurements possible. For N = 2, 
they constitute projective measurements chosen from the 
three mutually unbiased bases (MUBs [34|) for a qubit. 
It is already known that these MUB measurements are 
optimal in the sense that, given a uniform distribution 
for the experiment design, and no prior knowledge of 
the state, they provide the best tomographic precision 
[33l ]. The above result provides an alternative perspec- 
tive: MUB measurements are those for which the best 
tomographic precision, without prior knowledge of the 
state, is achieved using a uniform experiment design. 

In Figure [31 we plot the average OED A« OED » pre- 
dicted by Eq. ([57)) . alongside the prediction A^ OED ^ of 
the method presented previously (Eq. ([5"2"1) ~). for the case 
of minimal qubit tomography using a randomly chosen 
set of 3 binary POVMs. Note that these results are ex- 
act, since the state space is exactly spherical for a qubit. 
What is notable is that the two predictions appear to 
coincide — although the formulae are different, the two 
averaging procedures seem to result in identical, or very 
similar, experiment designs. Conceptually, minimizing 
the averaged CRB is slightly different to minimizing the 
CRB associated to the averaged Fisher information, so it 
is not immediately obvious why this is so. Empirically, 
however, we have found that both methods generate the 
same results. 



VI. ODT 

The method just presented has an obvious generaliza- 
tion: instead of choosing A so as to minimize the averaged 
Cramer- Rao bound ((B)) , we could find an expression for 
((SB 2 )), the variance of B over the state space, and try to 
find the vector A that minimizes this variance. Such an 
experiment design makes the tomographic reconstruction 
as 'fair' as possible, since the reconstruction should have 
as close as possible to a uniform precision for all states. 
We call this experiment design the optimal design for to- 




FIG. 3: Comparison of the two methods for calculating the 
average OED. (a): the set of 3 binary POVMs used in the 
qubit tomography set-up. Arrows represent the vectors a al 
on the Bloch sphere, (b): On the left is the average OED pre- 
dicted by averaging the Fisher information (see Eq. ([52}); on 
the right is the corresponding result predicted by averaging 
the CRB (see Eq. (|57[) h The two results are indistinguishable. 
This is typical: sometimes differences between the two meth- 
ods are discernible, but generally they produce very similar 
experiment designs. 



mography, or ODT for short. Using the expression in 
Eq. ([37]) . we can write the variance of B as 

((SB 2 )) = ((B 2 )) ((B)) 2 

= A- 1T VA-\ (60) 

where A -1 is the clement-wise inverse of A, and where the 
real, symmetric (and therefore positive definite) M x M 
variance matrix V is given by 

V jS = ^((^ 7 V5))-((V 7 ))((V5)) 
a'P' 

= y]v a '-y,p>5- (61) 
a'/3' 

Performing the average explicitly requires a stout heart; 
the result is 

v a 'y,0'S = ({x ))d a '^d/3'sK a ' 7i /s's 

-2((x 2 )) ^ Ca'ydpigWa'yjApiSj 
3 

Cot' j 

3 

+ ((.* 4 ))^AV 7J A> 5j 
3 

+ ((x 2 y 2 ))^2x a , 1 . J X , s , k 

+2((x 2 y 2 )) ^ A a ^ d Ap l s,jW a ><y,kWp>s,k 

-((x 2 ))J2x a , jd Xp, 5ik , (62) 

where we have defined the matrices W = DA and X = 
A * W , and where the roman indices j, k number the 
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Cartesian coordinates of the state space, and run from 1 



to N — 1. The average 



is defined in Eq. (f5o) and 



the other averages are given by |42|, [42 



«*v» 



((* 4 » 



r%(n 2 - i)r 



N 2 -l 



A(N 2 + 3)r (^^t3 

3«zV»- 



(63) 



We can now find the experiment design that minimizes 
Eq. (|6"0|) . Setting the derivative of the cost function J = 

((SB 2 )) + r] ( ^2 A 7 — 1) to zero, we obtain the condition 



(64) 



VX- 1 =r?A 2 , 



where A 2 is a column vector whose elements are given by 
A 2 . ft is not clear how to solve Eq. analytically, but 
it is not hard to find the solution A ODT numerically — 
almost any optimization algorithm will suffice. 



A. ODT or average OED? 

Both ODT and average OED are reasonable candi- 
dates for an experiment design that does not rely on prior 
knowledge of the true state p. They are both the results 
of a minimization, but their objective functions are dif- 
ferent, so how do they compare? To contrast the two 
optimizations, we used the analytic results in Eqs. ([M]) 
and (|6TJ)) to evaluate the objective functions ((B)) and 
((SB 2 )) using many sets of N 2 — I randomly chosen bi- 
nary POVMs, for a range of dimensionalities N = 2- 
fO. Operationally, the quantities \J ((B)) and ((SB 2 )) 1 / 4 
are more intuitive, having natural interpretations as dis- 
tances in the Bloch representation. The former is the 
bound on the root-mean-squared (r.m.s.) error in the re- 
constructed state r — the typical value of |r — r\; the 
latter is the square root of the standard deviation of the 
CRB, which can be thought of as the range over which 
the r.m.s. error \ f — r\ varies. 

In Figure 3] below, we show in the top two panels how 
the average OED and the ODT improve (i.e. reduce) 
the averages of these quantities (averaged over 3000 sets 
of random POVMs) when compared against the results 
generated with a uniform experiment design A = 1m /M. 
The improvements are a ll at t he level of around 10%, for 
both the r.m.s. error y/ ((B)) when using A^ OED ^ (part 
(a)), and the 'deviation' ((SB 2 )) 1 / 4 when using A ODT 
(part (b)). Improvements at this level are certainly signif- 
icant enough to motivate spending the time to calculate 
the appropriate optimal design. 

On the other hand, the lower two panels show how 
the two designs A^ OED ^ and A ODT perform in terms of 
the other's objective function. Part (c) shows that the 
average OED only improves the r.m.s. error by around 
1% over that achieved by the ODT. Similarly part (d) 
shows that the ODT, on average, reduces the deviation 



((SB 2 )) 1 / 4 by around 1% with respect to the deviation 
produced by using the average OED. And the differences 
fall away as the dimension N increases. It seems that 
there is actually not much to choose between the ODT 
and the average OED in terms of performance. This is 
good news: although there is a tradeoff between precision 
and fairness, it is small in the sense that optimizing for 
one only reduces the other by around a percent or less. 

These results are not definitive, because we cannot in- 
tegrate over the true state space — only our hypcrsphcr- 
ical approximations to it — but it is clear from Figure 0] 
that the choice made for Rn has little influence on our 
conclusion, suggesting that it should hold for the cor- 
rectly averaged quantities too. 



Average OED 



ODT 




23456789 10 
N 



FIG. 4: Comparison of ODT and average OED. The top two 
panels show how (a) average OED and (b) ODT compare 
with a uniform design A 7 = 1/M. In part (a), the relative 
reduction in the average of the r.m.s. error y/^Bjj afforded 
by using A^ OED ^ instead of a uniform design, averaged over 
3000 sets of randomly generated POVMs, is plotted against 
the dimension N of the quantum systems: the red bars result 
from setting Rn = Rmin', the blue from setting Rn = -Rmax- 
In part (b), we show the relative percentage reduction in the 
average of the 'deviation' ((SB 2 )) 1 / 4 afforded by using A ODT 
instead of a uniform design. The lower two plots compare 
ODT and average OED with each other. In part (c), we 
plot the relative reduction in the average r.m.s. error arising 
from using the average OED instead of the ODT. In part 
(d), the relative reduction in the deviation given by choosing 
the ODT over the average OED is shown. Note that the 
averages ((.)) over the state space are exact for qubits, for 
which -Rmin = iimax, so the discrepancy between the red and 
blue bars is entirely statistical for N = 2; the results for higher 
dimensions should be interpreted with this in mind. 



VII. MONTE CARLO SIMULATIONS 

So far we have presented a number of mathemati- 
cal results involving approximate averages of asymptotic 
bounds. The reader could be forgiven for doubting the 
utility of these results in the laboratory. Short of per- 
forming real experiments — the ideal proving ground — 
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we cannot do better than applying our techniques to sim- 
ulated data, so we have performed some Monte-Carlo 
simulations of minimal qubit tomography that provide 
some insight into when the CRB is saturated, and how 
much is gained by implementing an optimized experiment 
design. 

In these simulations, we generate a set of qubit states 
distributed throughout the Bloch sphere, as shown in 
part (a) of Figure [5l The states are chosen so that it 
is easy to calculate averages over the Bloch sphere by 
polynomial interpolation: we use Chebyshev grids for the 
radial and polar coordinates and an equally spaced grid 
for the azimuth; averages over the sphere can then be 
computed accurately using Clenshaw-Curtis quadrature 
radially and polar-wise, and Fourier interpolation around 
the equator [44| . Using 6 points for each coordinate yields 
averages that are accurate to within about 1% with rea- 
sonable computing times. 




FIG. 5: (a) The 216 sample states used to construct averages 
over the Bloch sphere using polynomial interpolation. The 
points are distributed evenly around the equator, but Cheby- 
shev grids are used for the polar and radial distributions — 
this explains the clustering of points towards the poles and 
the centre of the Bloch sphere, (b) The POVM elements used 
in the Monte-Carlo simulations presented below. 



Figure [6] shows the mean squared errors predicted by 
the CRB alongside the errors actually achieved using sim- 
ulated data equivalent to 2000 experimental runs, each 
consisting of AT to t = 1000 measurements. We chose a 
minimal set of binary POVMs randomly (shown in part 
(b) of Figure O , and then performed two simulations — 
one with a uniform experiment design, and one using the 
average OED predicted by Eq. (|57|) . Results are plotted 
for three different methods of state reconstruction, which 
we call inversion, least squares and maximum likelihood 

am. 



Reconstruction methods 



Here the notation A~ stands for the Moore-Penrose 
pseudo-inverse of A [4(| H3, EH , which exists even though 
A is generally rectangular (and therefore has no true in- 
verse). In Matlab we use the backslash operator [49l |. 
which computes f directly by Gaussian elimination. A 
problem with direct inversion is that sometimes the es- 
timate r lies outside the Bloch sphere, producing an 
unphysical reconstructed density matrix. In the least 
squares method, we remove any of these unphysical esti- 
mates by using the 'nearest' (in the least squares sense) 
physical state. This is particularly easy for the qubit 
states we consider here, since the state space is spher- 
ical: whenever the norm of r exceeds i?2 = l/v^, we 
re-normalize it, 



i? 2 x — . 



(66) 



The maximum likelihood method is a more nuanced ap- 
proach to the same problem 0, 0, [EH, HH ■ The ra- 
tionale is to try to find the physical state which is most 
likely to have produced the observed (simulated) data. 
This is the state r that maximizes the likelihood function 
p(n\r) defined in Eq. (fTTj) . or — more conveniently - 
the state that renders the corresponding log-likelihood C 
(see Eq. (p~3|) ) stationary with respect to changes in the 
estimated density matrix p. A simple iterative scheme 
that converges on this state, while including the posi- 
tivity and trace constraints on p, has been derived by 
Hradil. Starting with an unphysical state f generated by 
inversion, we first normalize it as per the least squares 
method. We then construct the corresponding density 
matrix p using Eq. |TJ, and then make the replacement 



pR(P)], 



(67) 



where the matrix R, which depends on p through the 
probabilities p ai = tr{H aj p}, is given by pi l5l| 



(68) 



We repeat this procedure with the updated estimate, and 
its associated operator R, until the algorithm converges. 
The resulting state is guaranteed to be physical, where 
the initial state was not. In the cases that inversion 
produces a physical estimate to start with, we accept 
it without applying the above procedure, since it can be 
shown that under these circumstances inversion already 
produces the most likely estimate pj]. 



Inversion is the simplest: wc first generate an estimate 
p of the 'true' statistics p using the relative frequencies of 
each measurement outcome in the simulated data {n aj }, 
n n 

Eq. " 



by setting p ai = n a ~ ( /N-y. We then substitute this into 
and solve for the Bloch vector, 



A- (p - c) . 



(65) 



B. Results 

As is clear from the plots in parts (c) and (d) in Fig- 
ure [6l the CRB describes the errors achieved by inversion 
extremely well (in both plots, the two lines are nearly 
indistinguishable, save for statistical fluctuations). Com- 
paring the two plots, it is also clear that the average 
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FIG. 6: Monte-Carlo results. Parts (a) and (b) show the 
variation of the CRB over the Bloch sphere for the uniform 
design and the average OED, respectively. In part (c), the 
mean squared errors achieved by the three methods inver- 
sion, least squares and maximum likelihood are plotted for 
the 216 sample states, along with the CRB (dashed line), for 
a uniform experiment design (inset). Part (d) shows the same 
results, except that the average OED was used instead (in- 
set). The plots have an oscillatory appearance because they 
are produced by 'unwrapping' the Bloch sphere, with the sam- 
ple states from part (a) of figure ([5]) arranged side by side. 



OED reduces the inversion errors significantly over those 
achieved using a uniform design. Integrating over the 
Bloch sphere, we find for this example that the average 
r.m.s. error is reduced by ~ 11% by optimizing the de- 
sign, in agreement with the predictions shown in Figure^] 
in the previous section. 

However, the precision of the other two reconstruction 
methods is not well described by the CRB. The least 
squares method performs significantly better than inver- 
sion, while the maximum likelihood method leaves both 
other methods standing, achieving mean squared errors 
nearly two orders of magnitude smaller than inversion 
for some states. The reason why these methods perform 
better is clear: knowledge of the boundaries of the space 
of physical states is utilized to improve the tomographic 
reconstruction. The least squares method is too crude 
to take full advantage of this constraint, but the maxi- 
mum likelihood method exploits it to impressive effect. 
The positivity constraint on the density matrix is not 
present anywhere in our derivation of the CRB, and this 
is why an estimation method which implements this con- 
straint is able to beat the 'lower bound' on the average 
errors represented by the CRB. These simulations high- 
light the fact that the CRB as presented in this paper 
applies rigorously only to state tomography via direct 
linear inversion of the measured statistics. 

Does this limit the applicability of our results relating 
to optimal experiment design? In the above example, use 



of the average OED only improves the average r.m.s. er- 
ror achieved by the least squares method over a uniform 
design by around 7%, while the maximum likelihood er- 
rors are hardly affected at all — the improvement is less 
than 1%. In the next section, we will consider an alterna- 
tive formulation of the CRB that includes the positivity 
constraint, and which therefore correctly describes the 
precision achieved by the maximum likelihood method. 
We will see that its optimization results in very similar 
experiment designs to those presented above, from which 
we conclude that our results remain close to optimal even 
when used with estimation methods that enforce positiv- 
ity. 



VIII. CONSTRAINED ESTIMATORS 

It is well-known in classical statistics that the bias and 
variance of an estimator are complementary. The pre- 
cision of an estimate can be improved using some prior 
knowledge of the estimated quantities, but inevitably this 
biases the estimate, shifting the mean of the estimator 
away from the 'true' value [23|, [53[. The least squares 
and maximum likelihood methods are biased estimation 
techniques, because they incorporate the positivity con- 
straint that excludes unphysical states, but at the same 
time they are more precise than direct inversion — which 
is unbiased — and they are able to beat the unbiased 
CRB. 

It is useful to visualize the true quantum state as a 
point in the Bloch sphere, surrounded by a spherical 
'bubble' that represents isotropic statistical fluctuations. 
If the true state lies close to the boundary of the Bloch 
sphere, the bubble may extend into the region of un- 
physical states. A constrained estimation method such as 
maximum likelihood will exclude these unphysical states, 
which distorts the shape of the error bubble. This re- 
duces its size, improving the tomographic precision, but 
it is clear that it also introduces bias, since the centre 
of mass of the bubble no longer coincides with the true 
state. 

The CRB can be modified to yield a bound on the 
precision of biased estimators, but it requires an analytic 
expression for the gradient of the bias as a function of the 
true state [H, [54| . This depends on the particular esti- 
mator being used, which removes some of the generality 
of the expression; in the case of the maximum likelihood 
method, it is not possible to write down such an expres- 
sion in any case, since the estimator itself is only defined 
implicitly, as the fixed point of the iteration described in 
Eq. (ffj?)) [231 ]. In general, constructing CRBs for estima- 
tors biased by inequality constraints — such as positivity 
— is a hard problem [251 ] . and so it is not obvious how 
to adapt the foregoing analysis to maximum likelihood 
tomography. 

In this section, we show that an alternative parame- 
terization of the density matrix allows us to replace the 
inequality constraint of positivity with the equality con- 
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straint of unit trace. The CRB for estimators with equal- 
ity constraints is a much more tractable problem, and we 
will derive the appropriate constrained CRB (CCRB), 
which correctly describes maximum likelihood tomogra- 
phy. 



A. The Cholesky representation 

Instead of using the Bloch representation of p, we will 
use its Cholesky decomposition [45l 150. l55j . That is, we 
write p in the form 



P 



(69) 



where T is a unique [60j upper triangular matrix with real 
elements along its main diagonal. The positivity of p is 
built-in to this parameterization, because the eigenvalues 
of p are given by the squares of the singular values of T. 
Clearly the Hermiticity of p is also guaranteed. However 
the trace condition must be added separately (compare 
this with the Bloch representation, in which the trace 
condition and Hermiticity arc 'automatic', and positivity 
must be imposed separately), 



tr {T f T} = 1. 



(70) 



The problem of quantum state tomography now reduces 
to finding the N 2 real numbers that parameterize the 
matrix T, subject to the equality constraint in Eq. ([70|) . 
We define the Cholesky vector 8 as the column vector 
comprising these numbers, by analogy with the Bloch 
vector r. To construct 8, we first define 



t = 



vec (3? (Tt 
vec (3? {Tt 



(71) 



as the (2iV 2 )-dimensional column vector formed by vec- 
torizing first the real part, and then the imaginary part 
of T\ and concatenating the two. We then eliminate all 
the redundant elements of t, which contain zeros because 
of the structure of T (that is, because T is upper trian- 
gular with a real diagonal — 'accidental' zeros we keep). 
This gives us 8. With these definitions, the trace condi- 
tion becomes simply \t\ 2 = 1, or equivalently (since we 
only remove zero elements to produce 8), 



\e\ 2 = i. 



(72) 



That is, the Cholesky vector is of unit length, so that 
physical states are restricted to lying on an (TV 2 — 1)- 
sphere in what we might call 'Cholesky space'. To de- 
rive the CCRB — the Cramer-Rao bound that incorpo- 
rates this equality constraint — we need to evaluate the 
Fisher information using the Cholesky parameterization, 
which requires differentiation of the measurement statis- 
tics with respect to 8. Substituting Eq. (|69|) in to Eq. (|3|), 
we find 



Pa 



tr-jrn^rt} 



(73) 



In terms of t, we have 



Pet" 



t T P t 



(74) 



where the 2iV 2 x 27V 2 matrices P ai are generated from 
the POVM elements H aj according to the relation 



P aj = M x 



/«) n 



a 7 



/<g>n 



a 7 



x M~ 



(75) 



with / the N x N identity matrix and M the matrix that 
separates real and imaginary parts, defined by [56j, 157J 



M = 



1 1 

— i i 



In 2 , 



(76) 



where I N 2 = I g) I is the N 2 x TV 2 identity matrix. Fi- 
nally, we arrive at the following bilinear expression for 
the statistics in terms of 



Pay — 8 Qay8, 



(77) 



where the N 2 x N 2 matrices Q aj are formed from the 
Pay by deleting rows and columns with indices given by 
those of the elements of t that are deleted to produce 8. 
The Q ai mirror the properties of the POVM elements 
themselves, being real, symmetric matrices that sum to 
the identity operator, 



Qay — Qay i ^ ' Qa-f I. 



N 2 - 



(78) 



Differentiating the log-likelihood function with respect to 
8, we find the Fisher information in the Cholesky repre- 
sentation to be 



F = iY,—Qa~f09 T Q c 

Pa 7 



Q(7 

= Z T AP- 1 Z, 



(79) 



where the diagonal matrices A and P are as defined in 
Section HTT1 and where the rows of the n to t X N 2 matrix 
Z are given by the row vectors 



'a7 



28 T Q ai 



(80) 



Note that unlike the matrix A appearing in the Bloch 
representation (see Eq. (|2"4"|)). Z depends on the state 
8, because the state enters the measurement statistics 
quadratically, rather than linearly, in the Cholesky rep- 
resentation. This more complicated dependence on the 
state makes further manipulations, such as averaging 
over the state space, more involved than in the Bloch 
representation. But we can find the CCRB, and we can 
find the associated OED numerically 



B. The constrained Cramer-Rao bound 

In general, a set of n equality constraints on 8 can be 
written as s(8) = 0, where s is an n-dimcnsional column 
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vector of constraint functions. Let the derivative of s 
with respect to 9 T be the n x N 2 matrix S. The CCRB 
is then given by [12, [H, [13] 



B c = tr {u(U T FU) X ?7 T } 



(81) 



where {/ is the unitary matrix whose columns span the 
null space of 5, 

SU = 0. (82) 
Since U T U = Ijv 2 > the cyclic property of the trace gives 



B c = 



(83) 



In our case, this formula produces an extremely simple 
result. We have just a single equality constraint, so that 
s = s = \9\ 2 — 1. The matrix S then collapses to the 
row vector 29 T , and then U can be any matrix whose 
columns span the (N 2 — l)-dimensional subspace that is 
orthogonal to 9. Now, multiplying out the product F9 
reveals that F6 = 49, so that 9 is always an eigenvec- 
tor of the Fisher matrix with eigenvalue 4. Since F is 
Hermitian, its N 2 — 1 remaining eigenvectors Uj are all 
orthogonal to 9, and to each other, and we can write 



F = 499 T + Y J F J u, ] u], 

3=2 



(84) 



where the Fj are the eigenvalues of F associated with the 
e 

and then wc have 



Uj. Now, we are free to take the Uj as the columns of U 



N 



U T FU = 4U T 99 T U + Y^FjUuju]U, 

(F 2 \ 

F 3 



\ 



Inverting this diagonal matrix is trivial, and so we obtain 
the final expression for the CCRB 



N 



F< 

=2 3 



(85) 



That is, up to an additive constant, the CCRB is the 
same as the unconstrained CRB in the Cholcsky repre- 
sentation. 

In Figure [7J we show the results of another Monte- 
Carlo simulation, in which we compare the mean squared 
errors (|0 - 9\ 2 ) predicted by the CCRB with those pro- 
duced by maximum likelihood estimation, and by the 
least squares method, for the same set of qubit states as 
depicted in part (a) of Figure \5\ The CCRB describes 
the errors achieved by both methods well, although the 
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FIG. 7: The mean squared error (\0 — 6\ 2 ) achieved by the 
maximum likelihood and least squares methods is plotted 
alongside the CCRB for the same set of sample states as used 
to generate Figure [5] In these Monte-Carlo simulations, we 
chose a random minimal set of binary qubit POVMs, and a 
uniform experiment design. We then averaged the errors over 
2000 virtual experiments, involving 10 5 measurements each 
— it seems that convergence on the CCRB requires larger 
samples than for the CRB in the Bloch representation. The 
precision attained by both constrained estimation methods is 
clearly well-described by the CCRB, although it is discernible 
that maximum likelihood performs slightly better, as might 
be expected. For some states, the CCRB seems to 'blow up', 
predicting much worse performance than actually achieved. 
These states are close to the pure state boundary, on which 
the Fisher matrix in the Cholesky representation becomes sin- 
gular. 



least squares method is slightly less precise. It is not pos- 
sible to compare the errors produced by inversion, since 
non-positive density matrices cannot be represented with 
a Cholesky vector. 

For pure states, the Cholesky matrix T contains just 
a single non-zero diagonal element, so that the Cholesky 
vector 9 contains zero elements. The determinant of the 
outer product 99 T then vanishes, rendering the Fisher 
information singular. This explains the divergences of 
the CCRB for pure states; of course the precision actually 
achieved remains finite! Numerical failure in computing 
the CCRB is easily avoided by slightly shifting the pure 
states away from the boundary of the Bloch sphere. 



C. Experiment designs for constrained estimators 

The CCRB can be used to find the OED for con- 
strained estimators. The analytic formulae from Section 
IIVI do not quite carry over to the Cholesky representa- 
tion, but the numerical method described in Section IIIII 
can be applied directly. Eq. ([79]) for the Fisher infor- 
mation in the Cholcsky representation has precisely the 
same structure as Eq. (|24[) for F in the Bloch representa- 
tion. Ignoring the constant shift of j in Eq. (jSSJ) , finding 
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the OED simply requires that we minimize the trace of 
the inverse of F. To find the solution, we implement 
exactly the same numerical algorithm as for the Bloch 
representation, the only difference being that we replace 
the matrix A with Z in the Eqs. ([26 ]) — ([28] ) . 

To examine the difference between these two optimiza- 
tions — the first appropriate to unbiased tomography 
and the second designed for constrained estimators - 
we would like to compare the average OEDs predicted by 
the two calculations. Unfortunately we cannot use any of 
the techniques described previously to evaluate the aver- 
age OED in the Cholesky representation, since the Fisher 
information defies attempts to average or invert it ana- 
lytically. But it is possible to arrive at the average OED 
— or at least an average OED — by brute force. First, 
one calculates the OED for each of the sample states rj 
shown in part (a) of Figure so as to obtain A d (t"j)- 
Next, one uses polynomial interpolation to numerically 
approximate the average of A OED over the Bloch sphere, 
yielding the design 

A <OED:brute> = _J_ f A OED (r) ^ (g6) 

which is literally the average OED. This method is only 
feasible for qubit states, where a direct numerical aver- 
age can be performed quickly by polynomial interpola- 
tion with a few sample points. When this brute force 
method is implemented in the Bloch representation, we 
have found that the resulting distributions seem to coin- 
cide very closely with the designs A^ OED ^ and A^ OED ^ 
introduced in Section [Vj this is why we still refer to 
A (OED:bruto) ag the average OED. 

We compared the average OEDs for the Bloch and 
Cholesky representations using 1000 randomly chosen 
minimal sets of binary qubit POVMs. For each POVM 
set, we calculated the 'discrepancy' V as the sum of the 
absolute difference between the two designs, 

— _ I x(OED:brutc) _ , (OED:brutc) 

U ~ | A 7, Bloch A 7, Cholesky • \° ' ) 

7 

Since by definition the A's sum to unity, one would ex- 
pect an average discrepancy of order 1, when compar- 
ing two completely unrelated distributions. We found 
(D) ~ 0.0418(7) for the average discrepancy, which shows 
that the optimal design for reconstruction by inversion 
is very close to the optimal design for maximum likeli- 
hood tomography. Although we are not able to compare 
the ODTs for the two parameterizations (because the 
Cholesky representation is too unwieldy), this suggests 
that the results we derived in the Bloch representation 
for unbiased estimators are useful for constrained estima- 
tors too, even though the Bloch representation does not 
account for the constraints. 

Despite this, the precision of maximum likelihood es- 
timation seems to improve less than the precision of in- 
version upon adoption of the average OED (whether de- 
rived using either the Bloch or the Cholesky representa- 
tion), indicating that this reconstruction technique is less 



amenable to optimization generally. It seems that maxi- 
mum likelihood estimation is able to recruit the positivity 
constraint to counteract the statistical errors that a poor 
experiment design exacerbates. After our foray into the 
Cholesky representation, we are able to conclude that the 
smaller improvements in maximum likelihood tomogra- 
phy are 'intrinsic' to the method, and are not due to the 
neglect of the positivity constraint when optimizing the 
design in the Bloch representation. Of course, if pre- 
cision matters, it is always better to use an optimized 
design over a uniform one. 



IX. CONCLUSION 

We have revisited the problem of optimal experiment 
design for quantum state tomography, first introduced 
in |l|. We have shown that specialist convex optimiza- 
tion software is not necessary to calculate the OED, and 
further that analytic results exist for minimal tomogra- 
phy, in which the measurements do not overdeterminc 
the state. The reliance of the OED on knowledge of the 
true state is rather paradoxical, but we have explored a 
number of averaging methods that remove this state de- 
pendence. The averaging relics on approximating the 
state space as hyperspherical (an exact procedure for 
qubits), but the results are generally insensitive to the 
radius chosen for the hypersphere, indicating that this 
approximation is reasonably robust. 

We introduced the average OED, which optimizes the 
average precision, or the 'precision on average', depend- 
ing on the details of the method — the resulting designs 
appear to be the same. We have also considered the 
ODT, which seeks to render a tomographic experiment 
maximally fair. Fortunately, it appears that the average 
OED and the ODT are rather similar, so that fair mea- 
surements are generally precise. Finally, we confirmed 
that the formalism correctly describes the achieved pre- 
cision for unbiased state estimation techniques, but that 
constrained estimators appear to perform better than 
predicted. However a new formulation of the problem al- 
lows the treatment of these constrained estimators, and 
shows that designs optimal for unbiased estimators are 
also very close to optimal for constrained methods. 

Although this paper contains a rather large number of 
results, it seems that many of the various optimal de- 
signs we have proposed are effectively interchangeable. 
One can use the ODT, or any one of three methods for 
computing the average OED (if one includes the brute 
force method for qubits), or one can use the Cholesky 
representation. Numerical evidence suggests there is lit- 
tle to choose between them. This is good news for ex- 
perimentalists who would like an answer to the question 
"What experiment design is optimal in quantum tomog- 
raphy, when I don't already know what state I expect?" 
After all is said and done, probably the most useful re- 
sponse is "The numerical method for computing the av- 
erage OED described in Section IV Al " This method is 
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fast, and works for any number of measurements on a 
system of arbitrary dimension [6l| . 

Further work would help to justify some of the claims 
we have made, particularly regarding the applicability of 
our averaging method in higher dimensions. An inter- 
esting possibility is to apply these techniques in exam- 
ining the effectiveness of a detector; perhaps they can 
be used to design better measurements under laboratory 
constraints. In any case, as quantum tomography be- 
comes increasingly indispensable, we hope that our re- 
sults will prove useful in real experiments. 
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